Northwest Arctic Borough
Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models
Hamdani, Rajaa El, Haffoudhi, Samy, Holzenberger, Nils, Suchanek, Fabian, Bonald, Thomas, Malliaros, Fragkiskos D.
Language models (LMs) encode substantial factual knowledge, but often produce answers judged as incorrect. We hypothesize that many of these answers are actually correct, but are expressed in alternative surface forms that are dismissed due to an overly strict evaluation, leading to an underestimation of models' parametric knowledge. We propose Retrieval-Constrained Decoding (RCD), a decoding strategy that restricts model outputs to unique surface forms. We introduce YAGO-QA, a dataset of 19,137 general knowledge questions. Evaluating open-source LMs from 135M to 70B parameters, we show that standard decoding undervalues their knowledge. For instance, Llama-3.1-70B scores only 32.3% F1 with vanilla decoding but 46.0% with RCD. Similarly, Llama-3.1-8B reaches 33.0% with RCD, outperforming the larger model under vanilla decoding. We publicly share the code and dataset at https://github.com/Rajjaa/disambiguated-LLM.
- Asia > Singapore (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > China > Hong Kong (0.04)
- (17 more...)
- Media (1.00)
- Aerospace & Defense (0.93)
- Leisure & Entertainment > Sports > Soccer (0.69)
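The RCD abstract above describes restricting model outputs to a fixed set of surface forms. One common way to realize that general idea is trie-constrained greedy decoding, sketched below under several assumptions. The allowed answers are assumed to be already tokenized, for example entity names from a knowledge base. A plain scoring function stands in for the language model's next-token log-probabilities. `build_trie`, `constrained_greedy_decode`, `END`, and the toy scores are hypothetical illustrations, not code from the authors' repository.

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-answer marker, not a real tokenizer symbol

Trie = Dict[str, "Trie"]


def build_trie(candidates: Sequence[Sequence[str]]) -> Trie:
    """Build a nested-dict prefix trie; every complete candidate ends in an END key."""
    root: Trie = {}
    for tokens in candidates:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def constrained_greedy_decode(
    score: Callable[[List[str], str], float],
    trie: Trie,
    max_len: int = 32,
) -> List[str]:
    """Greedy decoding that only scores tokens keeping the prefix inside the trie.

    Stops at END or after max_len tokens.
    """
    prefix: List[str] = []
    node = trie
    for _ in range(max_len):
        # Tokens outside the candidate set are never considered, however high they would score.
        best = max(node, key=lambda tok: score(prefix, tok))
        if best == END:
            break
        prefix.append(best)
        node = node[best]
    return prefix


# Toy scorer standing in for model log-probabilities (hypothetical numbers).
def toy_score(prefix: List[str], tok: str) -> float:
    table = {"New": -0.5, "York": -0.1, "City": -1.0, "NYC": -2.0, END: -0.3}
    return table.get(tok, -10.0)


trie = build_trie([["New", "York", "City"], ["New", "York"], ["NYC"]])
print(constrained_greedy_decode(toy_score, trie))  # ['New', 'York']
```

Every decoded answer is a listed surface form, or a prefix of one if `max_len` is hit. It can therefore be matched exactly against gold answers instead of being dismissed as a paraphrase.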
Fine-Scale Soil Mapping in Alaska with Multimodal Machine Learning
Lin, Yijun, Chen, Theresa, Brungard, Colby, Grunwald, Sabine, Ives, Sue, Macander, Matt, Nawrocki, Timm, Chiang, Yao-Yi, Jelinski, Nic
Fine-scale soil mapping in Alaska, traditionally relying on fieldwork and localized simulations, remains a critical yet underdeveloped task, despite the region's ecological importance and extensive permafrost coverage. As permafrost thaw accelerates due to climate change, it threatens infrastructure stability and key ecosystem services, such as soil carbon storage. High-resolution soil maps are essential for characterizing permafrost distribution, identifying vulnerable areas, and informing adaptation strategies. We present MISO, a vision-based machine learning (ML) model to produce statewide fine-scale soil maps for near-surface permafrost and soil taxonomy. The model integrates a geospatial foundation model for visual feature extraction, implicit neural representations for continuous spatial prediction, and contrastive learning for multimodal alignment and geo-location awareness. We compare MISO with Random Forest (RF), a traditional ML model that has been widely used in soil mapping applications. Spatial cross-validation and regional analysis across Permafrost Zones and Major Land Resource Areas (MLRAs) show that MISO generalizes better to remote, unseen locations and achieves higher recall than RF, which is critical for monitoring permafrost thaw and related environmental processes. These findings demonstrate the potential of advanced ML approaches for fine-scale soil mapping and provide practical guidance for future soil sampling and infrastructure planning in permafrost-affected landscapes. The project will be released at https://github.com/knowledge-computing/Peatland-permafrost.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > Alaska > Fairbanks North Star Borough > Fairbanks (0.14)
- (10 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)
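The MISO abstract evaluates models with spatial cross-validation and reports recall. A minimal sketch of block-based spatial cross-validation and a recall metric follows. It assumes square grid blocks assigned round-robin to folds. `spatial_block_folds`, `recall`, the block size, and the coordinates are hypothetical choices for illustration, not the protocol used in the paper.

```python
import math
from typing import List, Sequence, Tuple


def spatial_block_folds(
    coords: Sequence[Tuple[float, float]], block_size: float, k: int
) -> List[int]:
    """Assign each (x, y) point to one of k folds by the square grid block containing it.

    Points in the same block always share a fold, which keeps spatially clustered
    samples from landing on both sides of a train/test split (neighbours across
    block edges can still leak).
    """
    blocks = [(math.floor(x / block_size), math.floor(y / block_size)) for x, y in coords]
    # Round-robin over the distinct blocks in a fixed (sorted) order.
    fold_of_block = {b: i % k for i, b in enumerate(sorted(set(blocks)))}
    return [fold_of_block[b] for b in blocks]


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of true positives (e.g. near-surface permafrost present) that were predicted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    n_pos = sum(1 for t in y_true if t == positive)
    return tp / n_pos if n_pos else 0.0


coords = [(0.1, 0.1), (0.2, 0.3), (5.0, 5.0), (5.5, 5.1), (10.0, 0.0)]
folds = spatial_block_folds(coords, block_size=1.0, k=2)
for f in range(2):
    test_idx = [i for i, g in enumerate(folds) if g == f]
    train_idx = [i for i, g in enumerate(folds) if g != f]
    print(f, train_idx, test_idx)
```

Holding out whole blocks approximates the "remote, unseen locations" setting better than a random split would. With a random split, near-duplicate neighbours usually appear in both the training and test sets.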
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Liu, Fuxiao, Tan, Hao, Tensmeyer, Chris
Vision-language pretraining models have achieved great success in supporting multimedia applications by understanding the alignments between images and text. Existing vision-language pretraining models primarily focus on understanding a single image associated with a single piece of text. As a result, they often ignore alignment at the intra-document level, where a document consists of multiple sentences and multiple images. In this work, we propose DocumentCLIP, a salience-aware contrastive learning framework that trains vision-language pretraining models to comprehend the interaction between images and longer text within documents. Our model benefits real-world multimodal document understanding for content such as news articles, magazines, and product descriptions, which are linguistically and visually richer. To the best of our knowledge, we are the first to explore multimodal intra-document links through contrastive learning. In addition, we collect a large Wikipedia dataset for pretraining that covers diverse topics and structures. Experiments show that DocumentCLIP outperforms state-of-the-art baselines in the supervised setting. It also achieves the best zero-shot performance in the wild, as judged by human evaluation. Our code is available at https://github.com/FuxiaoLiu/DocumentCLIP.
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- North America > United States > Alaska > Northwest Arctic Borough > Kotzebue (0.04)
- North America > United States > Alaska > North Slope Borough > Utqiagvik (0.04)
- (8 more...)
- Health & Medicine (1.00)
- Government (0.68)
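DocumentCLIP builds on contrastive learning between images and text. For orientation, the sketch below shows a standard CLIP-style symmetric contrastive (InfoNCE) loss over a batch of image and text embeddings, where pair i is the positive and all other pairs are negatives. It does not reproduce DocumentCLIP's salience-aware weighting. The function names, the 0.07 temperature, and the toy embeddings are assumptions for illustration.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax_at(row: Sequence[float], idx: int) -> float:
    """log(softmax(row)[idx]) computed with the log-sum-exp trick for stability."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse


def clip_contrastive_loss(
    image_emb: Sequence[Sequence[float]],
    text_emb: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """Symmetric InfoNCE: (image i, text i) is the positive pair; other pairs are negatives."""
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # Cosine-similarity logits scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: each row should pick its own column.
    loss_img_to_txt = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each column should pick its own row.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_txt_to_img = -sum(_log_softmax_at(columns[j], j) for j in range(n)) / n
    return (loss_img_to_txt + loss_txt_to_img) / 2


aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)  # True
```

In an intra-document setting, the "text" side of each pair would be the document section linked to a figure. How sections and figures are paired and weighted is where DocumentCLIP's own design comes in.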
Enhancing Supply Chain Resilience: A Machine Learning Approach for Predicting Product Availability Dates Under Disruption
Camur, Mustafa Can, Ravi, Sandipp Krishnan, Saleh, Shadi
The COVID-19 pandemic and ongoing political and regional conflicts have had a highly detrimental impact on the global supply chain, causing significant delays in logistics operations and international shipments. One of the most pressing concerns is uncertainty about product availability dates, which companies need in order to generate effective logistics and shipment plans. Accurately predicting availability dates therefore plays a pivotal role in executing successful logistics operations and, ultimately, in minimizing total transportation and inventory costs. We investigate the prediction of product availability dates for General Electric (GE) Gas Power's inbound shipments for gas and steam turbine service and manufacturing operations, using both numerical and categorical features. We evaluate several regression models, including Simple Regression, Lasso Regression, Ridge Regression, Elastic Net, Random Forest (RF), Gradient Boosting Machine (GBM), and Neural Network models. On real-world data, our experiments show that the tree-based algorithms (i.e., RF and GBM) achieve the lowest generalization error and outperform all other regression models tested. We anticipate that our prediction models will help companies manage supply chain disruptions and reduce supply chain risks on a broader scale.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Mississippi (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (8 more...)
- Transportation > Freight & Logistics Services (1.00)
- Energy (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)
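The abstract above combines numerical and categorical features in regression models. The sketch below shows one of the listed baselines, ridge regression, fitted in closed form after one-hot encoding a categorical feature. It is not the tree-based models that performed best. The feature names (`supplier`, `lead_weeks`), the target (`delay` in days), the data, and the helper functions are all hypothetical.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, Union[str, float]]


def encode(row: Row, cat_key: str, num_keys: Sequence[str], cats: Sequence[str]) -> List[float]:
    """Intercept, numerical features, then a one-hot block for one categorical feature."""
    return (
        [1.0]
        + [float(row[k]) for k in num_keys]
        + [1.0 if row[cat_key] == c else 0.0 for c in cats]
    )


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_fit(X: List[List[float]], y: Sequence[float], lam: float = 1.0) -> List[float]:
    """Closed-form ridge: (X^T X + lam * D) w = X^T y, where D leaves the intercept unpenalized."""
    d = len(X[0])
    A = [
        [sum(x[i] * x[j] for x in X) + (lam if i == j and i > 0 else 0.0) for j in range(d)]
        for i in range(d)
    ]
    b = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical shipments; the values are made up for illustration.
rows = [
    {"supplier": "A", "lead_weeks": 1, "delay": 2.0},
    {"supplier": "A", "lead_weeks": 2, "delay": 4.0},
    {"supplier": "A", "lead_weeks": 3, "delay": 6.0},
    {"supplier": "B", "lead_weeks": 1, "delay": 7.0},
    {"supplier": "B", "lead_weeks": 2, "delay": 9.0},
    {"supplier": "B", "lead_weeks": 3, "delay": 11.0},
]
cats = sorted({r["supplier"] for r in rows})
X = [encode(r, "supplier", ["lead_weeks"], cats) for r in rows]
w = ridge_fit(X, [r["delay"] for r in rows], lam=1e-4)
print(predict(w, encode({"supplier": "B", "lead_weeks": 4}, "supplier", ["lead_weeks"], cats)))
```

The full one-hot block is collinear with the intercept. The ridge penalty on the non-intercept weights keeps the system solvable, which is one reason regularized linear models pair naturally with one-hot categoricals.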
A Framework for Flexible Peak Storm Surge Prediction
Pachev, Benjamin, Arora, Prateek, del-Castillo-Negrete, Carlos, Valseth, Eirik, Dawson, Clint
Storm surge is a major natural hazard in coastal regions, responsible for both significant property damage and loss of life. Accurate, efficient models of storm surge are needed both to assess long-term risk and to guide emergency management decisions. High-fidelity regional- and global-ocean circulation models such as the ADvanced CIRCulation (ADCIRC) model can accurately predict storm surge, but they are very computationally expensive. Here we develop a novel surrogate model for peak storm surge prediction based on a multi-stage approach. In the first stage, points are classified as inundated or not. In the second, the level of inundation is predicted. Additionally, we propose a new formulation of the surrogate problem in which storm surge is predicted independently for each point. This allows predictions to be made directly for locations not present in the training data and significantly reduces the number of model parameters. We demonstrate our modeling framework on two study areas: the Texas coast and the northern portion of the Alaskan coast. For Texas, the model is trained with a database of 446 synthetic hurricanes. The model accurately matches ADCIRC predictions on a test set of synthetic storms. We further test the model on Hurricanes Ike (2008) and Harvey (2017). For Alaska, the model is trained on a dataset of 109 historical surge events. We test the surrogate model on actual surge events that occurred after those in the training data, including the recent Typhoon Merbok (2022). For both datasets, the surrogate model's performance on real events is similar to ADCIRC's when compared against observational data. In both cases, the surrogate models are many orders of magnitude faster than ADCIRC.
- North America > United States > Alaska > Nome Census Area > Nome (0.14)
- North America > Mexico (0.04)
- Asia > Taiwan (0.04)
- (16 more...)
Alaska Schools Get Faster Internet--Partly Thanks to Global Warming
Before they got down to business for the day, students in Devin Tatro's social studies class were offered a quiet moment of self-reflection: On this golden fall afternoon at Nome-Beltz Junior/Senior High School, were they feeling chipper, distressed, or somewhere in between? One by one, they selected the picture of the facial expression that best matched their mood, and with a swift click sent an answer to the teacher. She scanned the responses and made a few mental notes. Then, without missing a beat, she switched the smartboard display and launched into a multiple-choice quiz using a game-based online learning platform called Kahoot! "Tell me one thing you remember about yesterday's lesson on expansions and tax on Native Americans," Tatro said, pacing the front of the classroom. She rattled off students' responses as they popped up on the smartboard in a colorful word cloud: "Forced relocation, reduced population, disease, warfare, cultural destruction ... wow, that's a powerful term."
- North America > United States > Alaska > Northwest Arctic Borough (0.06)
- North America > United States > Alaska > North Slope Borough > Prudhoe Bay (0.05)
- North America > United States > Virginia (0.05)
- (6 more...)
- Education > Educational Setting > Online (0.49)
- Education > Educational Setting > K-12 Education > Secondary School (0.36)
Gaussian Process Regression for Arctic Coastal Erosion Forecasting
Kupilik, Matthew, Witmer, Frank, MacLeod, Euan-Angus, Wang, Caixia, Ravens, Tom
Arctic coastal morphology is governed by multiple factors, many of which are affected by climatological changes. As the season length for shorefast ice decreases and temperatures warm permafrost soils, coastlines are more susceptible to erosion from storm waves. Such coastal erosion is a concern, since the majority of the population centers and infrastructure in the Arctic are located near the coasts. Stakeholders and decision makers increasingly need models capable of scenario-based predictions to assess and mitigate the effects of coastal morphology on infrastructure and land use. Our research uses Gaussian process models to forecast Arctic coastal erosion along the Beaufort Sea near Drew Point, AK. Gaussian process regression is a data-driven modeling methodology capable of extracting patterns and trends from data-sparse environments such as remote Arctic coastlines. To train our model, we use annual coastline positions and near-shore summer temperature averages from existing datasets and extend these data by extracting additional coastlines from satellite imagery. We combine our calibrated models with future climate models to generate a range of plausible future erosion scenarios. Our results show that the Gaussian process methodology substantially improves yearly predictions compared to linear and nonlinear least squares methods, and is capable of generating detailed forecasts suitable for use by decision makers.
- Arctic Ocean > Beaufort Sea (0.25)
- North America > United States > Alaska > Anchorage Municipality > Anchorage (0.14)
- North America > United States > Alaska > Northwest Arctic Borough > Arctic (0.14)
- (4 more...)
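The erosion abstract relies on Gaussian process regression. The sketch below computes the standard GP posterior mean and variance with a Cholesky factorization. Several choices are assumptions, not the paper's calibrated setup: the squared-exponential (RBF) kernel, the fixed hyperparameters, the zero prior mean (so targets should be centered), and the toy inputs. In practice, the inputs could be year and/or summer temperature averages.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: Sequence[float], b: Sequence[float], length: float, var: float) -> float:
    """Squared-exponential kernel."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return var * math.exp(-0.5 * d2 / length**2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L y = b."""
    y: List[float] = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_t(L: List[List[float]], y: Sequence[float]) -> List[float]:
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(
    X: List[List[float]], y: Sequence[float], Xs: List[List[float]],
    length: float = 1.0, var: float = 1.0, noise: float = 1e-2,
) -> List[Tuple[float, float]]:
    """Posterior (mean, variance) of the latent function at each test input."""
    n = len(X)
    K = [[rbf(X[i], X[j], length, var) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward_t(L, forward(L, y))  # alpha = K^{-1} y
    out = []
    for xs in Xs:
        k = [rbf(xs, X[i], length, var) for i in range(n)]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward(L, k)
        out.append((mean, rbf(xs, xs, length, var) - sum(vi * vi for vi in v)))
    return out


# Toy, already-centered data for illustration only.
for mean, variance in gp_predict([[0.0], [1.0], [2.0]], [0.0, 1.0, 0.0], [[1.0], [1.5], [10.0]], noise=1e-6):
    print(f"{mean:.3f} +/- {2 * math.sqrt(max(variance, 0.0)):.3f}")
```

The predictive variance grows back to the prior variance away from the data. This is what lets a GP report wider uncertainty bands for forecasts that extrapolate beyond the observed record.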
Exchangeable Random Measures for Sparse and Modular Graphs with Overlapping Communities
Todeschini, Adrien, Miscouridou, Xenia, Caron, François
A network is composed of a set of nodes, or vertices, with connections between them. Network data arise in a wide range of fields, including social networks, collaboration networks, communication networks, biological networks, and food webs. They are a useful way of representing interactions between sets of objects. Of particular importance is the elaboration of random graph models that can capture the salient properties of real-world graphs. Following the seminal work of Erdős and Rényi (1959), various network models have been proposed; see the overviews of Newman (2003b, 2009), Kolaczyk (2009), Bollobás (2001), Goldenberg et al. (2010), Fienberg (2012), or Jacobs and Clauset (2014). In particular, a large body of the literature has concentrated on models that can capture modular or community structure within the network. The first statistical network model in this line of research is the popular stochastic block-model (Holland et al., 1983; Snijders and Nowicki, 1997; Nowicki and Snijders, 2001). The stochastic block-model assumes that each node belongs to one of $p$ latent communities, and that the probability of a connection between two nodes is given by a $p \times p$ connectivity matrix. This model has been extended in various directions. Extensions introduce degree-correction parameters (Karrer and Newman, 2011), allow the number of communities to grow with the size of the network (Kemp et al., 2006), or consider overlapping communities (Airoldi et al., 2008; Miller et al., 2009; Latouche et al., 2011; Palla et al., 2012; Yang and Leskovec, 2013). Stochastic block-models and their extensions have been shown to offer a very flexible modeling framework with interpretable parameters. They have been successfully used to analyze numerous real-world networks.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (35 more...)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Data Science > Data Mining (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
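The passage above defines the basic stochastic block-model: each node belongs to one of $p$ latent communities, and edges appear with probabilities from a $p \times p$ connectivity matrix. The sketch below samples a graph from that basic model. It is not the exchangeable random measure model the paper proposes. Some details are assumptions for illustration: `sample_sbm` is a hypothetical name, community proportions `pi` are drawn i.i.d., and the graph is undirected without self-loops.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_sbm(
    n: int,
    pi: Sequence[float],
    B: Sequence[Sequence[float]],
    seed: Optional[int] = None,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Sample an undirected stochastic block-model graph without self-loops.

    pi: community proportions (length p); B: symmetric p x p connection-probability matrix.
    Returns the community label of each node and the edge list.
    """
    p = len(pi)
    if len(B) != p or any(len(row) != p for row in B):
        raise ValueError("B must be a p x p matrix")
    rng = random.Random(seed)
    z = rng.choices(range(p), weights=pi, k=n)  # latent community of each node
    edges: List[Tuple[int, int]] = []
    for i in range(n):
        for j in range(i + 1, n):
            # Edge (i, j) appears independently with probability B[z_i][z_j].
            if rng.random() < B[z[i]][z[j]]:
                edges.append((i, j))
    return z, edges


z, edges = sample_sbm(30, pi=[0.5, 0.5], B=[[0.6, 0.05], [0.05, 0.6]], seed=0)
within = sum(1 for i, j in edges if z[i] == z[j])
print(len(edges), within)
```

With a strongly diagonal `B`, most sampled edges fall within communities. That modular structure is the one the later extensions (degree correction, growing or overlapping communities) build on.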